
    One-dimensional and multi-dimensional substring selectivity estimation

    With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/42330/1/778-9-3-214_00090214.pd

    Effects of substituting rare-earth ion R by non-magnetic impurities in $R_2BaNiO_5$ - theory and numerical DMRG results

    In this paper we study the effect of substituting R (rare-earth ion) by non-magnetic ions in the spin-1 chain material $R_2BaNiO_5$. Using a strong-coupling expansion and numerical density matrix renormalization group calculations, we show that spin-wave bound states are formed at the impurity site. Experimental consequences of the bound states are pointed out.
    Comment: 5 pages, 4 postscript figures

    LOF: Identifying density-based local outliers

    For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
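
    The idea of scoring each object by how isolated it is relative to its neighborhood can be illustrated with a small sketch. The abstract does not state the formulas, so the code below follows the commonly cited definition of LOF (k-distance, reachability distance, local reachability density) and should be read as an illustration rather than the authors' implementation. The function names `knn` and `lof_scores` are hypothetical, the distance is assumed to be Euclidean, and the sketch assumes distinct points and uses exactly k neighbors per point, with ties broken by index.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))  # stable sort
    return others[:k]

def lof_scores(points, k):
    """Return one LOF score per point; values well above 1 suggest an outlier.

    Assumption: points are distinct, so no local density is infinite.
    """
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        # Reachability distance of p with respect to o.
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in neighbors[p]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] for o in neighbors[p]) / (k * lrd[p]) for p in range(n)]
```

    Under these assumptions, points inside a uniform cluster score close to 1, while a point far from the cluster scores much higher, which matches the abstract's notion of a degree of being an outlier rather than a binary label.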